The purpose of this notebook is to demonstrate how one can use PyLogit to predict the probabilities of discrete outcomes. The two main use cases are when one has
Both of these uses will be demonstrated below. Essentially, the "predict" function of a model object is the place to go to perform both of these tasks. The exception to this is for mixed logit models, where in addition to the predict function, there is also a "panel predict" function that allows one to condition on past observations for individuals in one's prediction dataset that were also in one's estimation dataset. For more information on the panel_predict method, see chapter 11 ("Individual-Level Parameters") of Discrete Choice Methods with Simulation by Kenneth Train (2009).
The examples below should make all of this clear.
The dataset being used in this notebook is the "Swissmetro" dataset used in the Python Biogeme examples. The data can be downloaded at http://biogeme.epfl.ch/examples_swissmetro.html, and a detailed explanation of the variables and data-collection procedure can be found at http://www.strc.ch/conferences/2001/bierlaire1.pdf.
In [1]:
# For recording the model specification
from collections import OrderedDict
# For making plots pretty
import seaborn
# For file input/output
import pandas as pd
# For vectorized math operations
import numpy as np
# For plotting
import matplotlib.pyplot as plt
# For model estimation and prediction
import pylogit as pl
# To display plots inline
%matplotlib inline
In [2]:
# Load the dataset
raw_data = pd.read_csv("../data/long_swiss_metro_data.csv")
# Get a list of the choice situation ids that will be used
# in estimation and in prediction
all_situation_ids = np.sort(raw_data["custom_id"].unique())
# Set a random seed for reproducibility
np.random.seed(61)
# Shuffle and split the ids
np.random.shuffle(all_situation_ids)
prediction_ids = all_situation_ids[:2000]
estimation_ids = all_situation_ids[2000:]
# Create the estimation and prediction datasets
estimation_df = raw_data.loc[raw_data["custom_id"].isin(estimation_ids)].copy()
prediction_df = raw_data.loc[raw_data["custom_id"].isin(prediction_ids)].copy()
In [3]:
# Look at the first 5 rows of the data
# Note that for the mode_id, 1-Train, 2-SwissMetro, 3-Car
estimation_df.head().T
Out[3]:
As in previous examples, the model specification being used is $$ \begin{aligned} V_{i, \textrm{Train}} &= \textrm{ASC Train} + \\ &\quad \beta _{ \textrm{tt_transit} } \textrm{Travel Time} _{ \textrm{Train}} * \frac{1}{60} + \\ &\quad \beta _{ \textrm{tc_train} } \textrm{Travel Cost}_{\textrm{Train}} * \left( GA == 0 \right) * 0.01 + \\ &\quad \beta _{ \textrm{headway_train} } \textrm{Headway} _{\textrm{Train}} * \frac{1}{60} + \\ &\quad \beta _{ \textrm{survey} } \left( \textrm{Train Survey} == 1 \right) \\ \\ V_{i, \textrm{Swissmetro}} &= \textrm{ASC Swissmetro} + \\ &\quad \beta _{ \textrm{tt_transit} } \textrm{Travel Time} _{ \textrm{Swissmetro}} * \frac{1}{60} + \\ &\quad \beta _{ \textrm{tc_sm} } \textrm{Travel Cost}_{\textrm{Swissmetro}} * \left( GA == 0 \right) * 0.01 + \\ &\quad \beta _{ \textrm{headway_sm} } \textrm{Heaway} _{\textrm{Swissmetro}} * \frac{1}{60} + \\ &\quad \beta _{ \textrm{seat} } \left( \textrm{Seat Configuration} == 1 \right) \\ &\quad \beta _{ \textrm{survey} } \left( \textrm{Train Survey} == 1 \right) \\ &\quad \beta _{ \textrm{first_class} } \left( \textrm{First Class} == 0 \right) \\ \\ V_{i, \textrm{Car}} &= \beta _{ \textrm{tt_car} } \textrm{Travel Time} _{ \textrm{Car}} * \frac{1}{60} + \\ &\quad \beta _{ \textrm{tc_car}} \textrm{Travel Cost}_{\textrm{Car}} * 0.01 + \\ &\quad \beta _{\textrm{luggage}=1} \left( \textrm{Luggage} == 1 \right) + \\ &\quad \beta _{\textrm{luggage}>1} \left( \textrm{Luggage} > 1 \right) \end{aligned} $$ Here, $i$ is used to denote a particular individual.
In [4]:
# NOTE: - Specification and variable names must be ordered dictionaries.
# - Keys should be variables within the long format dataframe.
# The sole exception to this is the "intercept" key.
# - For the specification dictionary, the values should be lists
# of integers or or lists of lists of integers. Within a list,
# or within the inner-most list, the integers should be the
# alternative ID's of the alternative whose utility specification
# the explanatory variable is entering. Lists of lists denote
# alternatives that will share a common coefficient for the variable
# in question.
basic_specification = OrderedDict()
basic_names = OrderedDict()
basic_specification["intercept"] = [1, 2]
basic_names["intercept"] = ['ASC Train',
'ASC Swissmetro']
basic_specification["travel_time_hrs"] = [[1, 2,], 3]
basic_names["travel_time_hrs"] = ['Travel Time, units:hrs (Train and Swissmetro)',
'Travel Time, units:hrs (Car)']
basic_specification["travel_cost_hundreth"] = [1, 2, 3]
basic_names["travel_cost_hundreth"] = ['Travel Cost * (Annual Pass == 0), units: 0.01 CHF (Train)',
'Travel Cost * (Annual Pass == 0), units: 0.01 CHF (Swissmetro)',
'Travel Cost, units: 0.01 CHF (Car)']
basic_specification["headway_hrs"] = [1, 2]
basic_names["headway_hrs"] = ["Headway, units:hrs, (Train)",
"Headway, units:hrs, (Swissmetro)"]
basic_specification["seat_configuration"] = [2]
basic_names["seat_configuration"] = ['Airline Seat Configuration, base=No (Swissmetro)']
basic_specification["train_survey"] = [[1, 2]]
basic_names["train_survey"] = ["Surveyed on a Train, base=No, (Train and Swissmetro)"]
basic_specification["regular_class"] = [1]
basic_names["regular_class"] = ["First Class == False, (Swissmetro)"]
basic_specification["single_luggage_piece"] = [3]
basic_names["single_luggage_piece"] = ["Number of Luggage Pieces == 1, (Car)"]
basic_specification["multiple_luggage_pieces"] = [3]
basic_names["multiple_luggage_pieces"] = ["Number of Luggage Pieces > 1, (Car)"]
In [5]:
# The 'alt_id_column' is the name of a column to be created in the long-format data
# It will identify the alternative associated with each row.
alt_id_column = "mode_id"
# The "obs_id_column" is a custom id column that ignores the fact that this is a
# panel/repeated-observations dataset. This column denotes each individual choice
# situation.
obs_id_column = "custom_id"
# The "choice_column" records the name of the column that denotes whether or not each
# individual chose the alternative on a given row.
choice_column = "CHOICE"
In [6]:
# Estimate the multinomial logit model (MNL)
swissmetro_mnl = pl.create_choice_model(data=estimation_df,
alt_id_col=alt_id_column,
obs_id_col=obs_id_column,
choice_col=choice_column,
specification=basic_specification,
model_type="MNL",
names=basic_names)
# Specify the initial values and method for the optimization.
swissmetro_mnl.fit_mle(np.zeros(14))
In [7]:
# Create the various specification and name dictionaries
# for the asymmetric logit model. It is the same as for the MNL
# except that it excludes the intercept from V_{i, alternative}.
asym_specification = OrderedDict()
asym_names = OrderedDict()
for col in basic_specification:
if col != "intercept":
asym_specification[col] = basic_specification[col]
asym_names[col] = basic_names[col]
# Get the list of intercept names for the asymmetric logit model
# This is used to tell PyLogit that we will be using 'outside'
# intercept parameters.
# See equation 2 of http://arxiv.org/abs/1606.05900 for more
# details
asym_intercept_names = basic_names["intercept"]
# Specify what alternative is not having its intercept estimated
asym_intercept_ref_pos = 2
# Give names to the shape parameters of the asymmetric logit model
# Note that all of the shape parameters are not identifiable so we
# need to restrict one of them. This accounts for why we do not have
# a "shape_car" name.
asym_shape_names = ["shape_train", "shape_swiss_metro"]
# Note the index of the alternative whose shape parameter is constrained
# (i.e. the Car alternative)
asym_ref = 2
# Create the asymmetric logit's model object
swissmetro_asym = pl.create_choice_model(data=estimation_df,
alt_id_col=alt_id_column,
obs_id_col=obs_id_column,
choice_col=choice_column,
specification=asym_specification,
model_type="Asym",
names=asym_names,
shape_names=asym_shape_names,
intercept_names=asym_intercept_names,
shape_ref_pos=asym_ref,
intercept_ref_pos=asym_intercept_ref_pos)
# Note that, below, we use None for initial values and use kwargs to
# specify our initial values for the optimization. This is not
# necessary, but it is convenient and an option.
# Note that dividing the index coefficients by log(J), where J is
# the number of alternatives in the dataset, accounts for the fact
# that when each shape parameter is 1/J, the value of the asymmetric
# logit model's estimated index coefficients are equal to the logit
# model's estimates, divided by log(J).
# Again, see http://arxiv.org/abs/1606.05900 for more details.
swissmetro_asym.fit_mle(None,
init_shapes=np.zeros(2),
init_intercepts=swissmetro_mnl.params.values[:2],
init_coefs=swissmetro_mnl.params.values[2:] / np.log(3))
In [8]:
# Create the various specification and name dictionaries
# for the uneven logit model. It is the same as for the MNL
# except that it excludes the intercept from V_{i, alternative}.
uneven_specification = OrderedDict()
uneven_names = OrderedDict()
for col in basic_specification:
if col != "intercept":
uneven_specification[col] = basic_specification[col]
uneven_names[col] = basic_names[col]
# Get the list of intercept names for the uneven logit model
# This is used to tell PyLogit that we will be using 'outside'
# intercept parameters.
# See equation 2 of http://arxiv.org/abs/1606.05900 for more
# details
uneven_intercept_names = basic_names["intercept"]
# Specify what alternative is not having its intercept estimated
uneven_intercept_ref_pos = 2
# Specify the names of the uneven logit model's shape parameters
# Note that we include "shape_car" because all of the uneven logit
# model's shape parameters are identifiable.
uneven_shape_names = ["shape_train", "shape_swiss_metro", "shape_car"]
swissmetro_uneven = pl.create_choice_model(data=estimation_df,
alt_id_col=alt_id_column,
obs_id_col=obs_id_column,
choice_col=choice_column,
specification=uneven_specification,
model_type="Uneven",
names=uneven_names,
shape_names=uneven_shape_names,
intercept_names=uneven_intercept_names,
intercept_ref_pos=uneven_intercept_ref_pos)
# Also, note that we use None for initial values and use kwargs to
# specify our initial values for the optimaztion. This is necessary
# to use 'outside' intercept parameters with the model.
swissmetro_uneven.fit_mle(None,
init_shapes=np.zeros(3),
init_intercepts=swissmetro_mnl.params.values[:2],
init_coefs=swissmetro_mnl.params.values[2:])
In [9]:
# Create the various specification and name dictionaries
# for the scobit model. It is the same as for the MNL
# except that it excludes the intercept from V_{i, alternative}.
scobit_specification = OrderedDict()
scobit_names = OrderedDict()
for col in basic_specification:
if col != "intercept":
scobit_specification[col] = basic_specification[col]
scobit_names[col] = basic_names[col]
# Get the list of intercept names for the scobit model
# This is used to tell PyLogit that we will be using 'outside'
# intercept parameters.
# See equation 2 of http://arxiv.org/abs/1606.05900 for more
# details
scobit_intercept_names = basic_names["intercept"]
# Create the names of the shape parameters that are needed for the scobit model
scobit_shape_names = ["shape_train", "shape_swiss_metro", "shape_car"]
# Specify which intercept/ASC is not being estimated (namely, the Car intercept)
scobit_intercept_ref = 2
swissmetro_scobit = pl.create_choice_model(data=estimation_df,
alt_id_col=alt_id_column,
obs_id_col=obs_id_column,
choice_col=choice_column,
specification=scobit_specification,
model_type="Scobit",
names=scobit_names,
shape_names=scobit_shape_names,
intercept_ref_pos=scobit_intercept_ref,
intercept_names=scobit_intercept_names)
# Note that we are using 'outside' intercept parameters for this model
swissmetro_scobit.fit_mle(None,
init_shapes=np.zeros(3),
init_intercepts=swissmetro_mnl.params.values[:2],
init_coefs=swissmetro_mnl.params.values[2:])
In [10]:
# Create the specification and name dictionaries
# for the clog-log model.It is the same as for the MNL
# except that it excludes the intercept from V_{i, alternative}.
clog_specification = OrderedDict()
clog_names = OrderedDict()
# Copy the specification dictionary from the logit
# model, without the intercept parameters so we
# can place the intercept parameters outside the
# index
for col in basic_specification:
if col != "intercept":
clog_specification[col] = basic_specification[col]
clog_names[col] = basic_names[col]
# Get the list of intercept names for the clog-log model
# This is used to tell PyLogit that we will be using 'outside'
# intercept parameters.
# See equation 2 of http://arxiv.org/abs/1606.05900 for more
# details
clog_intercept_names = basic_names["intercept"]
# Specify which intercept/ASC is not being estimated
# (i.e, the Car intercept)
clog_intercept_ref = 2
# Create the Clog-log model object
swissmetro_clog = pl.create_choice_model(data=estimation_df,
alt_id_col=alt_id_column,
obs_id_col=obs_id_column,
choice_col=choice_column,
specification=clog_specification,
model_type="Cloglog",
intercept_ref_pos=clog_intercept_ref,
names=clog_names,
intercept_names=clog_intercept_names)
# Estimate the clog log model. Note we don't pass one single array of
# initial values but instead pass keyword arguments
swissmetro_clog.fit_mle(None,
init_intercepts=swissmetro_mnl.params.values[:2],
init_coefs=swissmetro_mnl.params.values[2:])
In [11]:
# Specify the nesting specification.
# I.e. specify the name of the various nests,
# and which alternatives belong to which nests.
nest_membership = OrderedDict()
nest_membership["Future Modes"] = [2]
nest_membership["Existing Modes"] = [1, 3]
# Create the nested logit model object
swissmetro_nested = pl.create_choice_model(data=estimation_df,
alt_id_col=alt_id_column,
obs_id_col=obs_id_column,
choice_col=choice_column,
specification=basic_specification,
model_type="Nested Logit",
names=basic_names,
nest_spec=nest_membership)
# Create the initial values for the nest parameters
# Note: This should be in terms of the reparameterized values
# (i.e. the logit of the inverse of the scale parameter) used
# by PyLogit.
# Also, note that 40 corresponds to a scale parameter that is
# essentially equal to 1, and 5 corresponds to a scale
# parameter just slightly greater than 1.
init_nests = np.array([40, 5])
# Create the initial values for the estimation
initial_values = np.concatenate([init_nests,
swissmetro_mnl.params.values])
# Estimate the nested logit model
# Note we constrain the nesting parameter for nest 1 since
# the Future Modes nest only has 1 alternative in it.
swissmetro_nested.fit_mle(initial_values,
constrained_pos=[0])
swissmetro_nested.get_statsmodels_summary()
Out[11]:
Here, we'll estimate a relatively simple model, allowing for normally distributed sensitivities to travel time and travel cost. The distribution will be across the various individuals in the dataset(note, not across the choice situations of a given individual). The individuals are denoted by the "ID" column.
Given that we're estimating a distribution for sensitivities to time and cost, it makes sense to use a distribution with bounded support like a log-normal distribution. In this way, we could restrict the sensitivities to only being negative. However, Pylogit does not (yet!) support distributions other than the normal distribution for mixed logit.
In [12]:
# Specify the parameters, by name, that are being treated
# as random variables
mixing_variables = (basic_names["travel_time_hrs"] +
basic_names["travel_cost_hundreth"])
# Create the mixed logit model object
swissmetro_mixed = pl.create_choice_model(data=estimation_df,
alt_id_col=alt_id_column,
obs_id_col=obs_id_column,
choice_col=choice_column,
specification=basic_specification,
model_type="Mixed Logit",
names=basic_names,
mixing_id_col="ID",
mixing_vars=mixing_variables)
# Create the initial values for estimation.
# Note that we need one value for each beta
# coefficient and for each of the standard deviations
# that will now be estimated.
initial_values = np.concatenate([swissmetro_mnl.params.values,
np.zeros(len(mixing_variables))])
# Estimate the mixed logit model
swissmetro_mixed.fit_mle(initial_values,
seed=26,
num_draws=400)
swissmetro_mixed.get_statsmodels_summary()
Out[12]:
Here, we will simply predict the probabilities of each available alternative being chosen in each choice situation from the prediction dataset, using all of the models that we estimated above. Note that we will perform two predictions with the Mixed Logit model. One will use the standard "predict" function and the other will use the "panel_predict" function.
In [13]:
# Create an ordered dictionary mapping shortened model
# names to estimated model objects.
model_name_to_obj = OrderedDict()
model_name_to_obj["Clog"] = swissmetro_clog
model_name_to_obj["MNL"] = swissmetro_mnl
model_name_to_obj["Nested"] = swissmetro_nested
model_name_to_obj["Asym"] = swissmetro_asym
model_name_to_obj["Uneven"] = swissmetro_uneven
model_name_to_obj["Scobit"] = swissmetro_scobit
model_name_to_obj["Mixed"] = swissmetro_mixed
# Initialize a dictionary to store the log-likelihoods
# on the prediction dataset.
prediction_log_likelihoods = OrderedDict()
for key in model_name_to_obj:
prediction_log_likelihoods[key] = None
# Calculate and store the predictions for all models besides
# the mixed logit model. The predictions will be stored on
# prediction_df
for model_name in model_name_to_obj:
if model_name == "Mixed":
# Don't make predictions for the Mixed Logit Model
# using the for-loop
continue
else:
# Get the model object to be used in making predictions
model_obj = model_name_to_obj[model_name]
# Note that by default, the predict method returns the
# predicted probabilities for each available alternative
# for each choice situation.
prediction_array = model_obj.predict(prediction_df)
# We can use the "return_long_probs" and "choice_col"
# keyword arguments to ensure that we only return the
# probability of the chosen alternative for each
# choice situation.
chosen_predictions = model_obj.predict(prediction_df,
choice_col=choice_column,
return_long_probs=False)
# Calculate the predicted log-likelihood in two ways and,
# for demonstration, ensure that they agree.
log_likelihood_calc_1 = np.log(chosen_predictions).sum()
log_predictions = np.log(prediction_array)
choice_array = prediction_df[choice_column].values
log_likelihood_calc_2 = choice_array.dot(log_predictions)
assert np.allclose(log_likelihood_calc_1,
log_likelihood_calc_2)
# Store the log-likelihood
prediction_log_likelihoods[model_name] = log_likelihood_calc_1
# Create a column name for the predictions
prediction_col = model_name + "_Predictions"
# Store the predicted probabilities
prediction_df[prediction_col] = prediction_array
In [14]:
# Calculate and store the predictions for the mixed logit
# model using both the predict and panel_predict functions.
# Note that the panel_predict function will condition the
# predictions on the estimation data of each person
# (if there is any)
for prediction_type in ["Mixed", "Mixed_Panel"]:
if "Panel" not in prediction_type:
# We set the random seed simply for reproducibility
predictions = swissmetro_mixed.predict(prediction_df,
seed=2,
num_draws=400)
else:
# The arguments are ['data', 'num_draws']
args = [prediction_df, 400]
kwargs = {"seed": 2}
# Note that we can change the return type the same
# way as with the predict function. Pass
# return_long_probs = False and
# choice_col = choice_column as keyword arguments
predictions = swissmetro_mixed.panel_predict(*args,
**kwargs)
# Create the column name for the predictions
prediction_col = prediction_type + "_Predictions"
# Store the predicted probabilities
prediction_df[prediction_col] = predictions
# Calculate the log-likelihood
actual_choices = prediction_df[choice_column].values
log_predicted_probs = np.log(predictions)
prediction_log_likelihoods[prediction_type] =\
actual_choices.dot(log_predicted_probs)
In [15]:
# Look at the predicted log-likelihoods of the various models
pd.Series(prediction_log_likelihoods,
name="Log-likelihood of Predictions").sort(inplace=False)
Out[15]:
One feature of PyLogit's predict method is that it allows users to pass sets of parameters as a keyword argument (using "param_list"). This allows one to make multiple sets of predictions at once. Such a feature is useful when one has, for instance, a sample of MCMC samples from the posterior distribution of one's model parameters. Alternatively, as we shall see below, this feature is also useful when one wants to plot the log-likelihood function.
To demonstrate how one uses PyLogit to predict probabilities when one has 1 dataset and multiple sets of coefficients, we will simply use the Uneven Logit model. Most models in PyLogit support prediction with multiple sets of coefficients, and all models that support this feature make use of the same "param_list" keyword argument in the predict method.
The two times where prediction with multiple parameters is not suppored is in the Nested Logit model and in the "panel_predict" method of the Mixed Logit model. Note that the "standard" predict method of the Mixed Logit model does support prediction with multiple parameters.
In [16]:
swissmetro_uneven.get_statsmodels_summary()
Out[16]:
In particular, we want to plot the log-likelihood in the range of -1 to 0 for the travel cost parameter for the car utility. To do so, we will generate 100 values of between -1 and 0, containing the true parameter value.
In [17]:
# Create an array of 100 travel cost parameters between
# Note that the travel cost parameter of interest is
# the tenth parameter, and python is zero indexed so we
# use 9 to access its value
actual_travel_cost_param = swissmetro_uneven.params.iat[9]
low_travel_cost_params = np.linspace(-1,
actual_travel_cost_param,
num=50,
endpoint=False)
high_travel_cost_params = np.linspace(actual_travel_cost_param,
0,
num=50,
endpoint=True)
travel_cost_params = np.concatenate((low_travel_cost_params,
high_travel_cost_params))
Next, we'll create a 2D parameter array that contains all the sets of parameters. On the rows will be the particular parameters, and on the columns will be the sets of parameters.
In [18]:
# Initialize the 2D parameter array
param_array_2d = (swissmetro_uneven.params.values[:, None] *
np.ones(travel_cost_params.size)[None, :])
# "Fill in" the row for the travel cost parameter for cars
param_array_2d[9, :] = travel_cost_params
# Let's look at the array to make sure we created it correctly
param_array_2d[:, :4]
Out[18]:
Finally, PyLogit requires users to be explicit about the sets of parameters being passed as nest, shape, outside intercept, and index parameters. Note that index parameters are also referred to as utility coefficients. This level of explicitness is intended to help prevent errors, and it facilitates internal testing of the passed values.
To comply with the requirements of the method, we have to break apart the newly created parameter array into the four sets of parameters just mentioned: nest, shape, outside intercept, and index parameters.
In [19]:
# Since this is not a nested logit we have no nest parameters
nest_2d = None
# The uneven logit has 3 shape parameters so
# we select the first 3 rows.
shapes_2d = param_array_2d[:3, :]
# The uneven logit has 2 outside intercept parameters
# so we select the 4th and 5th rows
intercepts_2d = param_array_2d[3:5, :]
# Finally, the remaining parameters are the index coefficients
index_coefs_2d = param_array_2d[5:, :]
# Now, we pack all these arrays together in a list
# Note that the order is intentional and it is a requirement
# for the predict method.
parameter_list = [index_coefs_2d,
intercepts_2d,
shapes_2d,
nest_2d]
With the parameter list created, we can finally perform the prediction. To facilitate easy calculation of the log-likelihoods, given each set of parameter values, we will only return the chosen probabilities of each choice situation.
In [20]:
# Pack the keyword arguments into a dictionary, simply for
# line length considerations
kwargs = {"param_list": parameter_list,
"return_long_probs": False,
"choice_col": choice_column}
# Calculate the predictions for each set of parameter values
# Note that the returned array will have one column per
# set of parameter values and one row per choice situation
multiple_predictions = swissmetro_uneven.predict(estimation_df,
**kwargs)
# Calculate and plot the resulting log-likelihood
# Take the log of the probabilities of the chosen alternatives
# for each choice situation and sum them. log_likelihoods
# will be a 1D array of shape (travel_cost_params.size, )
log_likelihoods = np.log(multiple_predictions).sum(axis=0)
plt.plot(travel_cost_params,
log_likelihoods,
label="Log-Likelihood")
plt.vlines(actual_travel_cost_param,
log_likelihoods.min(),
swissmetro_uneven.log_likelihood,
linestyle='dashed',
label="Estimated Parameter Value")
plt.xlabel("Travel Cost, units: 0.01 CHF (Car)")
plt.ylabel("Log-Likelihood")
plt.title("Log-likelihod vs. Travel Cost Parameter for Car Utility")
plt.legend(loc='best')
plt.show()
In [ ]: